
replace tl.libdevice.llrint with tl.extra.cuda.libdevice.rint #1372

Closed
bjmsong wants to merge 2 commits into bitsandbytes-foundation:main from bjmsong:develop

Conversation

@bjmsong

@bjmsong bjmsong commented Sep 26, 2024

No description provided.

Collaborator

@TimDettmers left a comment


PR Review: #1372 — Replace tl.libdevice.llrint with tl.extra.cuda.libdevice.rint

[bug-fix] Triton API migration: updates tl.libdevice.llrint calls in 3 Triton quantization kernels to use the new tl.extra.cuda.libdevice path. The tl.libdevice namespace was removed in Triton 3.x, so this addresses a real compatibility issue.

Blocking issues (2):

  1. Wrong function: rint instead of llrint — The PR replaces tl.libdevice.llrint with tl.extra.cuda.libdevice.rint, but these are semantically different functions. llrint rounds to the nearest integer and returns an integer type (long long). rint rounds to the nearest integer but returns a float. Since the result is being stored to an int8 output tensor, the practical difference may be minor (implicit float-to-int cast), but tl.extra.cuda.libdevice.llrint exists and is the correct 1:1 replacement. Using rint is an unnecessary semantic change that could introduce subtle numerical differences in edge cases (e.g., values exactly at 0.5 boundaries, NaN/Inf handling).
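The rounding-mode and return-type distinction can be illustrated with a small sketch. This is an assumption-laden analogy: NumPy's np.rint follows the same IEEE round-half-to-even behavior as the libdevice rint, and the .astype(np.int8) cast stands in for the implicit float-to-int conversion the kernel relies on.

```python
# Sketch (assumption: np.rint mirrors libdevice rint's
# round-half-to-even semantics; llrint rounds the same way by
# default but returns an integer type instead of a float).
import numpy as np

vals = np.array([0.5, 1.5, 2.5, -0.5, 126.7])

rounded = np.rint(vals)   # float output; ties round to even
# values: 0., 2., 2., -0., 127.  (note 0.5 -> 0., 1.5 -> 2.)
print(rounded, rounded.dtype)

# The implicit float-to-int8 cast the quantization kernel depends on:
as_int8 = rounded.astype(np.int8)
print(as_int8)  # values: 0, 2, 2, 0, 127
```

With finite inputs in range, rint followed by an int8 cast lands on the same values llrint would produce; the divergence risk is in out-of-range, NaN, and Inf inputs, where the float intermediate's cast behavior is implementation-defined.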

  2. No regression test — This is a bug fix that changes quantization kernel behavior. There should be a test that exercises the Triton quantization path and verifies correctness. The bitsandbytes/triton/ kernels are guarded behind is_triton_available(), so existing tests may not exercise them. A minimal test confirming quantize_rowwise, quantize_global, and quantize_columnwise_and_transpose produce correct output with the updated API would be valuable.
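As a starting point for such a test, here is a hedged pure-NumPy sketch of the rowwise absmax int8 quantization math. quantize_rowwise_ref is a hypothetical reference implementation, not the library's API; a real regression test would run the Triton quantize_rowwise kernel on a CUDA device and compare its output against a reference like this.

```python
# Hypothetical NumPy reference for a rowwise-quantization regression
# test. Assumes the kernel computes round(127 * x / rowmax(|x|)) per
# row and stores the result as int8.
import numpy as np

def quantize_rowwise_ref(x: np.ndarray):
    """Rowwise absmax int8 quantization reference."""
    absmax = np.abs(x).max(axis=1, keepdims=True)
    out = np.rint(127.0 * (x / absmax)).astype(np.int8)
    return out, absmax.squeeze(1)

x = np.array([[0.5, -1.0], [2.0, 4.0]], dtype=np.float32)
q, absmax = quantize_rowwise_ref(x)
print(q)        # [[64, -127], [64, 127]]
print(absmax)   # [1., 4.]
```

A GPU-gated pytest (skipped when is_triton_available() is False) could assert that the kernel output matches this reference exactly for well-conditioned inputs.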

Additional note: This PR is from September 2024 and is significantly behind main. A rebase will likely be needed. The Triton directory has had substantial changes since then (XPU triton optimizers, compatibility guards, etc.).

  • Security: Clear (trivial API path change, no new imports, no suspicious patterns)
  • Downstream impact: None (internal Triton kernels, not part of public API)
  • Tests: Missing — no test covers the Triton quantization path change
  • CI: Not triggered (fork PR — maintainer must approve workflow run)
  • Serialization: Not affected
  • Cross-PR conflicts: None detected

  abs_x = tl.abs(x)
  max_val = tl.max(tl.where(row_mask, abs_x, 0), axis=0)
- output = tl.libdevice.llrint(127.0 * (x / max_val))
+ output = tl.extra.cuda.libdevice.rint(127.0 * (x / max_val))
Collaborator


tl.extra.cuda.libdevice.rint is not the correct 1:1 replacement for tl.libdevice.llrint. llrint rounds to nearest integer and returns an integer type; rint rounds to nearest integer but returns a float. Since tl.extra.cuda.libdevice.llrint exists in modern Triton, this should use tl.extra.cuda.libdevice.llrint instead to preserve the original semantics.

  x = tl.load(x_ptr + offsets, mask=mask)
  absmax_inv = tl.load(absmax_inv_ptr)
- output = tl.libdevice.llrint(127.0 * (x * absmax_inv))
+ output = tl.extra.cuda.libdevice.rint(127.0 * (x * absmax_inv))
Collaborator


Same issue here: should be tl.extra.cuda.libdevice.llrint (not rint) to match the original semantics.

@matthewdouglas
Member

Closing in favor of #1871 which removes this functionality instead.

